To address the lack of labeled data in low-resource languages, which prevents the application of mature deep learning methods to Named Entity Recognition (NER), a cross-lingual NER model based on a sentence-level Generative Adversarial Network (GAN), namely SLGAN-XLM-R (Sentence-Level GAN based on XLM-R), was proposed. Firstly, the labeled data of the source language was used to train the NER model on the basis of the pre-trained model XLM-R (XLM-RoBERTa), and at the same time, language adversarial training was performed on the embedding layer of the XLM-R model by combining the unlabeled data of the target language. Then, the soft labels of the unlabeled data of the target language were predicted by the NER model. Finally, the labeled source-language data and the soft-labeled target-language data were mixed to fine-tune the model again to obtain the final NER model. Experiments were conducted on four languages, English, German, Spanish, and Dutch, in the CoNLL2002 and CoNLL2003 datasets. The results show that with English as the source language, the F1 scores of the SLGAN-XLM-R model on the test sets of German, Spanish, and Dutch are 72.70%, 79.42%, and 80.03% respectively, which are 5.38, 5.38, and 3.05 percentage points higher than those obtained by directly fine-tuning the XLM-R model.
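The soft-label fine-tuning step above trains on the probability distributions the model predicts for target-language tokens rather than on hard tags. As a minimal illustration of that idea (not the paper's implementation, and with a simple pure-Python loss in place of the actual training framework), a soft cross-entropy over per-token class distributions could look like:

```python
import math

def soft_cross_entropy(pred_probs, soft_targets):
    """Average per-token loss -sum_c q_c * log(p_c).

    pred_probs: per-token class probability lists from the current model.
    soft_targets: soft labels predicted for unlabeled target-language tokens.
    """
    total = 0.0
    for p, q in zip(pred_probs, soft_targets):
        # skip zero-probability target classes to avoid log(0)
        total += -sum(qc * math.log(pc) for pc, qc in zip(p, q) if qc > 0.0)
    return total / len(pred_probs)
```

When the soft target collapses to a one-hot vector, this reduces to the ordinary cross-entropy used on the labeled source-language data.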
With the recent development of mobile communication technology and the popularization of smart devices, the computation-intensive tasks of terminal devices can be offloaded to edge servers to solve the problem of insufficient resources. However, the distributed nature of computation offloading technology exposes terminal devices and edge servers to security risks, while blockchain technology can provide a secure transaction environment for the computation offloading system. The combination of these two technologies can solve the resource insufficiency and security problems in the Internet of Things (IoT). Therefore, the research results on applications combining computation offloading and blockchain technologies in IoT were surveyed. Firstly, the application scenarios and system functions of the combination of computation offloading and blockchain technologies were analyzed. Then, the main problems solved by blockchain technology in the computation offloading system and the key techniques used were summarized, and the formulation methods, optimization objectives, and optimization algorithms of computation offloading strategies in blockchain systems were classified. Finally, the existing problems of the combination were discussed, and future development directions in this area were prospected.
Non-Intrusive Load Monitoring (NILM) technology provides technical support for demand-side management, and non-intrusive load identification is the key link in the load monitoring process. However, long-term sampling of load data cannot be carried out in real time at high frequency, so the time-sequence information of the obtained load data is lost; at the same time, Convolutional Neural Network (CNN) suffers from insufficient representation of low-level signal features. In view of these two problems, a CNN-based non-intrusive load identification algorithm with an upsampling pyramid structure was proposed. Working directly on the collected load current signals, the proposed algorithm compensated for the lost time-sequence information with the time-dimension correlations of the data expanded by the upsampling network, and extracted the high-level and low-level features of the load signals with bidirectional pyramid one-dimensional convolution, so that the load characteristics were fully utilized and unknown load signals could be identified. Experimental results show that the recognition accuracy of the proposed algorithm reaches 95.21%, indicating that it has good generalization ability and can effectively realize load identification.
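The upsampling step compensates for low-frequency sampling by expanding the current signal along the time dimension. The paper's upsampling network is learned; as a hedged illustration of the underlying idea only, a fixed linear-interpolation upsampler over a sampled current sequence might look like:

```python
def upsample_linear(signal, factor):
    """Expand a 1-D signal along the time dimension.

    Inserts factor-1 linearly interpolated points between each pair of
    consecutive samples, so len(out) == (len(signal) - 1) * factor + 1.
    """
    out = []
    for a, b in zip(signal, signal[1:]):
        for k in range(factor):
            out.append(a + (b - a) * k / factor)
    out.append(signal[-1])  # keep the final original sample
    return out
```

A learned upsampling layer plays the same role but lets the network choose how the inserted points are filled in, instead of assuming a straight line between samples.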
With the rapid development of cloud computing technology, the number of data centers has increased significantly, and the resulting energy consumption problem has gradually become one of the research hotspots. Aiming at the problem of server energy consumption optimization, a data center server energy consumption optimization algorithm combining eXtreme Gradient Boosting (XGBoost) and Multi-Gated Recurrent Unit (Multi-GRU), named ECOXG, was proposed. Firstly, data such as the resource occupation information and the energy consumption of each component of the servers were collected by Linux terminal monitoring commands and power consumption meters, and preprocessed to obtain the resource utilization rates. Secondly, the resource utilization rates were concatenated into a time series in vector form, which was used to train the Multi-GRU load prediction model, and simulated frequency reduction was performed on the servers according to the prediction results to obtain the load data after frequency reduction. Thirdly, the resource utilization rates of the servers were combined with the energy consumption data at the same moments to train the XGBoost energy consumption prediction model. Finally, the load data after frequency reduction were input into the trained XGBoost model to predict the energy consumption of the servers after frequency reduction. Experiments on the actual resource utilization data of 6 physical servers showed that the Root Mean Square Error (RMSE) of ECOXG was 50.9%, 31.0%, 32.7%, and 22.9% lower than those of the Convolutional Neural Network (CNN), Long Short-Term Memory (LSTM) network, CNN-GRU, and CNN-LSTM models, respectively. Meanwhile, compared with the LSTM, CNN-GRU, and CNN-LSTM models, ECOXG saved 43.2%, 47.1%, and 59.9% of the training time, respectively.
Experimental results show that ECOXG can provide a theoretical basis for the prediction and optimization of server energy consumption, and it is significantly better than the comparison algorithms in both accuracy and operating efficiency. In addition, the power consumption of the servers after the simulated frequency reduction is significantly lower than the real power consumption, and the energy-saving effect is outstanding when the utilization rates of the servers are low.
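The second step of ECOXG, building the resource utilization rates into vector-form time series for the Multi-GRU load prediction model, amounts to sliding a fixed-length window over the sequence of per-moment utilization vectors. A minimal sketch of that preprocessing (window length and data layout are assumptions, not taken from the paper):

```python
def sliding_windows(series, window):
    """Turn a sequence of utilization vectors into supervised samples.

    series: list of per-moment utilization vectors, e.g. [cpu, mem, ...].
    Each sample X[i] holds `window` consecutive vectors; its label y[i]
    is the vector at the next moment, which the load model learns to predict.
    """
    X, y = [], []
    for i in range(len(series) - window):
        X.append(series[i:i + window])
        y.append(series[i + window])
    return X, y
```

The windowed samples would then be fed to the recurrent load predictor, and the predicted next-moment utilization is what the simulated frequency reduction is applied to.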
Concerning that traditional Web text clustering algorithms do not consider Web text topic information, which leads to a low accuracy rate in multi-topic Web text clustering, a new topic-based Web text clustering algorithm was proposed. In this method, multi-topic Web text was clustered in three steps: topic extraction, feature extraction, and text clustering. Compared with traditional Web text clustering algorithms, the proposed method fully considers the topic information of Web text. The experimental results show that the accuracy rate of the proposed algorithm for multi-topic Web text clustering is higher than those of the text clustering methods based on K-means and HowNet.
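The final clustering step assigns each Web text to its most similar extracted topic. A minimal sketch, assuming documents and topics are both represented as term-weight vectors and similarity is cosine similarity (an assumption, since the abstract does not specify the representation or the measure):

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length term-weight vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def assign_to_topics(doc_vecs, topic_vecs):
    """Cluster step: each document goes to its most similar topic vector."""
    return [max(range(len(topic_vecs)), key=lambda t: cosine(d, topic_vecs[t]))
            for d in doc_vecs]
```

Taking topic information into account in this way is what separates the method from plain K-means over raw term vectors, where multi-topic documents pull centroids toward mixed, less meaningful clusters.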
To improve the accuracy of recommended Web resources, a personalized recommendation algorithm based on ontology, named BO-RM, was proposed. Topic extraction and similarity measurement methods were designed, and ontology semantics were used to cluster Web resources. By capturing a user's browsing tracks, the preference tendency and the recommendations were adjusted dynamically. Comparison experiments were conducted with a context-based collaborative filtering algorithm named CFR-RM and a model-based personalized prediction algorithm. The results show that BO-RM has relatively stable time overhead and good performance in Mean Reciprocal Rank (MRR) and Mean Average Precision (MAP). The results prove that BO-RM improves efficiency by analyzing large-scale Web resources offline, and is thus practical. In addition, BO-RM captures users' interests in real time to update the recommendation list dynamically, which meets the real needs of users.
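The MRR and MAP scores reported above are standard ranking metrics: MRR averages the reciprocal rank of the first relevant item in each recommendation list, and MAP averages precision over the positions of all relevant items. A small reference implementation over binary relevance flags:

```python
def mrr(ranked_relevance):
    """Mean Reciprocal Rank over lists of 0/1 relevance flags in ranked order."""
    total = 0.0
    for flags in ranked_relevance:
        for i, rel in enumerate(flags, start=1):
            if rel:
                total += 1.0 / i  # reciprocal rank of the first hit
                break
    return total / len(ranked_relevance)

def average_precision(flags):
    """Precision averaged over the ranks of the relevant items in one list."""
    hits, score = 0, 0.0
    for i, rel in enumerate(flags, start=1):
        if rel:
            hits += 1
            score += hits / i  # precision at this relevant position
    return score / hits if hits else 0.0

def mean_ap(ranked_relevance):
    """Mean Average Precision over a set of ranked recommendation lists."""
    return sum(average_precision(f) for f in ranked_relevance) / len(ranked_relevance)
```

Both metrics reward placing relevant resources near the top of the list, which is why they are natural choices for evaluating a recommendation algorithm like BO-RM.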